Abstract

The Middle East respiratory syndrome coronavirus (MERS-CoV) is an emerging virus that poses a 
major challenge to clinical management. 

The 3C-like protease (3CLpro) is essential for viral replication and thus represents a potential
target for antiviral drug development. Presently, very few data are available on MERS-CoV 3CLpro
inhibition by small molecules. We conducted extensive exploration of the pharmacophoric space
of a recently identified set of peptidomimetic inhibitors of the bat HKU4-CoV 3CLpro. HKU4-CoV
3CLpro shares high sequence identity (81%) with the MERS-CoV enzyme and thus represents a
potential surrogate model for anti-MERS drug discovery. We used 2 well-established methods:
quantitative structure-activity relationship (QSAR)-guided modeling and docking-based
comparative intermolecular contacts analysis. The established pharmacophore models highlighted
structural features needed for ligand recognition and revealed important binding-pocket regions
involved in 3CLpro-ligand interactions. The best models were used as 3D queries to screen the
National Cancer Institute database for novel nonpeptidomimetic 3CLpro inhibitors. The identified
hits were tested for HKU4-CoV and MERS-CoV 3CLpro inhibition. Two hits, which share the
phenylsulfonamide fragment, showed moderate inhibitory activity against the MERS-CoV 3CLpro
and represent a potential starting point for the development of novel anti-MERS agents. To the
best of our knowledge, this is the first pharmacophore modeling study supported by in vitro
validation on the MERS-CoV 3CLpro.

Highlights: 

• MERS-CoV is an emerging virus that is closely related to the bat HKU4-CoV. 

• 3CLpro is a potential drug target for coronavirus infection.

• HKU4-CoV 3CLpro is a useful surrogate model for the identification of MERS-CoV 3CLpro
enzyme inhibitors.

• dbCICA is a very robust modeling method for hit identification.

• The phenylsulfonamide scaffold represents a potential starting point for the development of
MERS coronavirus 3CLpro inhibitors.
1 | INTRODUCTION


Middle East respiratory syndrome coronavirus (MERS-CoV; HCoV-EMC/2012)
is an emerging virus that causes severe pneumonia
and exhibits a high mortality rate. 1 The first known human MERS-CoV
cases occurred in Jordan in 2012, before the causative virus was 
detected and identified later during the same year in Saudi Arabia. 2,3 
Since then, over 1900 laboratory-confirmed cases have been reported 
to the WHO in 27 countries across the world. 4 

MERS-CoV is an enveloped virus carrying a positive-sense RNA
genome. 5 The virus, which is considered primarily a zoonotic
virus, belongs to lineage C of Betacoronavirus and is thus closely
related to the bat coronaviruses HKU4 and HKU5. 6-8 Several studies
have shown that bats and camels are the most likely animal reservoir 
of MERS-CoV. 9-11 Accumulating evidence points to virus transmission 
from dromedary camels to humans. 12,13 

As is the case with many viral diseases, effective therapy against
MERS is lacking and supportive care is the only available treatment
option. Attempts to develop an effective vaccine against 
MERS-CoV infection have led to promising results but are still in 
early stages. 14-16 The high morbidity and mortality rates of 
MERS-CoV as well as its potential to cause epidemics highlight 
the need for novel drug discovery to develop effective and safe 
anti-MERS-CoV therapeutics. 

Several efforts have been undertaken to identify selective, potent
small molecules with anti-MERS-CoV activity. 17-21 Promising
compounds were identified via screening of FDA-approved drugs and
drug-like small molecules using cell-based systems and in vitro
screening. 17-24

Targets homologous to those identified in the severe acute
respiratory syndrome coronavirus (SARS-CoV) were investigated in
MERS-CoV (reviewed in Hilgenfeld and Peiris 25 ). 26-29 Among these,
the MERS-CoV main proteinase, also known as 3-chymotrypsin-like
protease (3CLpro), is considered an important potential target due to
its essential role in the viral life cycle. 26,29 The coronavirus genome
encodes an 800-kDa replicase polyprotein, which is processed by
3CLpro to yield intermediate and mature nonstructural proteins
responsible for many aspects of virus replication. 5,30,31 The enzyme
has started to attract interest as a target for anti-MERS-CoV drug
development. However, data on inhibition of the enzyme are scarce.
The SARS-CoV 3CLpro has been comprehensively explored as a drug
target, and many potent enzyme inhibitors have been
identified. 1,25,32,33 Elaborate structure- and ligand-based in silico models
obtained using the SARS-CoV 3CLpro inhibitors proved fruitless for the
identification of MERS-CoV 3CLpro inhibitors (modeling studies
conducted by our group, data not published). Interestingly, the 3CLpro
enzymes from different CoV strains are known to share significant
sequence and 3D structure homology, providing a strong structural
basis for designing wide-spectrum anti-CoV inhibitors. 34,35 Sequence
alignment studies showed that the active site residues of the
HKU4-CoV 3CLpro that participate in inhibitor binding are conserved
in the MERS-CoV 3CLpro, which has 81.0% sequence identity 36 to
HKU4-CoV 3CLpro (Figure 1). Therefore, the bat HKU4-CoV 3CLpro
has been investigated as a surrogate model for anti-MERS
development. 36 Novel peptidomimetic inhibitors of MERS-CoV 3CLpro
have been identified by using the enzyme from HKU4-CoV as a
model. 36

In this study, we used the set of peptidomimetic HKU4-CoV
3CLpro inhibitors reported in St. John et al 36 to conduct extensive
computational modeling studies. These modeling efforts aim at
establishing pharmacophore models to be used as 3D search queries
for virtual screening of potential MERS-CoV 3CLpro inhibitors. The
methods used here were developed previously by our group: the
QSAR-guided pharmacophore modeling 37,38 and the docking-based
comparative intermolecular contacts analysis (dbCICA) pharmacophore
modeling. 39,40 Both modeling approaches have been used successfully
to identify potent inhibitors against several drug targets. 37-41 The
identified hits were tested in vitro for their inhibitory activity against
the 3CLpro enzymes from HKU4-CoV and MERS-CoV.


2 | MATERIAL AND METHODS

All chemicals and reagents were purchased from Sigma-Aldrich 
(United States), unless otherwise stated. 

2.1 | QSAR-guided pharmacophore modeling

2.1.1 | Data preparation and pharmacophore exploration

The structures and biological data of 221 previously identified
HKU4-CoV 3CLpro inhibitors reported in St. John et al 36 (1-221,
Table S1) were used in modeling.

The bioactivities of these inhibitors were expressed as the
concentration of the test compound that inhibited the activity of
HKU4-CoV 3CLpro by 50% (IC50, µM). In cases of unavailable IC50
values (ie, 20-25 and 48-221, Table S1), the corresponding IC50
estimates were extrapolated based on reported inhibitory percentages
at 100 µM assuming linear dose-response relationships. The logarithms
of measured IC50 (µM) values were used in QSAR-guided
pharmacophore modeling to correlate bioactivity data linearly to free
energy change. Chiral centers with unknown configuration were
marked as “unknown” so that the inversion of these chiral centers is
sampled during conformation generation.
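As a minimal sketch of the single-point extrapolation described above, the snippet below assumes the linear dose-response line passes through the origin, so that the IC50 estimate is the test concentration scaled by 50 over the measured percent inhibition. The exact extrapolation formula is not given in the text, and the function names and the 25% example value are hypothetical.

```python
import math


def estimate_ic50_um(percent_inhibition, test_conc_um=100.0):
    """Estimate IC50 (µM) from one inhibition reading.

    Assumption of this sketch: inhibition rises linearly with
    concentration from the origin, so 50% inhibition is reached at
    test_conc * 50 / percent_inhibition.
    """
    if percent_inhibition <= 0:
        raise ValueError("percent inhibition must be positive")
    return test_conc_um * 50.0 / percent_inhibition


def log_ic50(ic50_um):
    """Base-10 logarithm of an IC50 expressed in µM."""
    return math.log10(ic50_um)


# Hypothetical compound showing 25% inhibition at 100 µM.
ic50 = estimate_ic50_um(25.0)
print(ic50, round(log_ic50(ic50), 3))
```

Under this assumption, weaker single-point inhibition maps to a proportionally larger IC50 estimate, which is why such values are best read as rough rankings rather than measured potencies.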

These compounds were used to explore the pharmacophoric
space of HKU4-CoV 3CLpro through a series of established modeling
steps as has been described previously. 38,42-46 The modeling workflow
is detailed in Sections S1 to S5.

2.1.2 | QSAR modeling

QSAR-guided selection of optimal pharmacophores was conducted
to find an optimal combination of pharmacophore models capable
of explaining bioactivity variation across the whole set of collected
training compounds (1-221, Table S1). 36 QSAR modeling was done
using the genetic function algorithm (GFA) to generate combinations
of descriptors (physicochemical and pharmacophores) (Sections S6
and S7). Subsequently, multiple linear regression (MLR) analyses
were used to assess the quality of the selected descriptor
combinations, ie, their ability to explain bioactivity variations within the collected inhibitors.
This QSAR modeling was performed using a training set of 177
compounds of the total set of HKU4-CoV 3CLpro inhibitors and
validated using leave-one-out r² (r²LOO) and predictive r² (r²PRESS)
against a randomly selected testing set of 44 inhibitors as described
in Sections S6, S7, and S8. The test set was selected by ranking the
total 221 inhibitors according to their IC50 values, and then every
fifth compound was selected for the testing set starting from the
high-potency end.

FIGURE 1 Comparison of the binding site of 3CLpro from HKU4-CoV and MERS-CoV. (A) A ribbon presentation of the superimposition of the
HKU4-CoV 3CLpro complex with a potent inhibitor (blue ribbons and green carbon atoms, 1.8 Å, PDB code 4YOI) and the MERS-CoV enzyme
(red ribbons and gray carbon atoms, 2.1 Å, PDB code 4YLU), showing the high similarity in protein folding and a close-up view of the main residues
interacting with inhibitors in HKU4-CoV and MERS-CoV 3CLpro binding pockets. The figure was prepared using the DS visualizer. (B) Amino acid
sequence alignment of the 3CLpro from HKU4-CoV and MERS-CoV. The sequence alignment was generated by using Clustal Omega.
Residues strictly conserved have a red background; similar residues are indicated by black bold letters with a yellow background according to a
Risler matrix implemented in ESPript. The symbols above the sequence correspond to the secondary structure of MERS-CoV 3CLpro (PDB code
4YLU; Tomar et al 30 ). The blue stars indicate residues in the binding pocket of the enzymes. MERS-CoV, Middle East respiratory syndrome
coronavirus; PDB, Protein Data Bank
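The test-set selection described above (rank all 221 inhibitors by IC50, then take every fifth compound from the high-potency end) can be sketched as follows. The starting offset is an assumption: taking the 5th, 10th, 15th, ... ranked compounds yields the stated 44 test and 177 training compounds, whereas starting at the first-ranked compound would yield 45. The compound names and IC50 values are placeholders, not data from Table S1.

```python
def split_every_fifth(ic50_by_compound):
    """Split compounds into training and test sets by potency rank.

    Compounds are ranked from most potent (lowest IC50) to least
    potent; every fifth ranked compound goes to the test set.
    Assumption: counting starts so that the 5th-ranked compound is
    the first test compound.
    """
    ranked = sorted(ic50_by_compound, key=ic50_by_compound.get)
    test = ranked[4::5]
    test_lookup = set(test)
    train = [name for name in ranked if name not in test_lookup]
    return train, test


# Placeholder data: 221 hypothetical compounds with distinct IC50 values.
ic50 = {f"cpd{i}": 0.5 * i for i in range(1, 222)}
train, test = split_every_fifth(ic50)
print(len(train), len(test))  # 177 44
```

Sampling along the ranked list, rather than drawing at random, keeps the potency range of the test set close to that of the training set.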

2.2 | Docking-based comparative intermolecular contacts analysis

Docking studies were performed using a subset of 27 compounds of
the peptidomimetic HKU4-CoV 3CLpro inhibitors with known
(absolute) stereochemistries (1-27, Table S1). The 3D coordinates of
HKU4-CoV 3CLpro were retrieved from the Protein Data Bank (PDB
code: 4YOI, 1.8 Å). 36 The protein structure was modified by adding
hydrogen atoms and Gasteiger-Marsili charges to the protein atoms
using Discovery Studio (version 2.5.5; Accelrys Inc, San Diego). It
was then used in subsequent docking experiments without energy
minimization.

Docking was conducted using both LibDock 47 and CDOCKER. 48
LibDock is a site-feature docking algorithm that docks ligands (after
removing hydrogen atoms) into an active site guided by binding
hotspots. 47 CDOCKER, by contrast, is a CHARMm-based
molecular dynamics method that implements simulated
annealing to search for the most stable docked ligand poses. 48 These
docking engines consider the flexibility of the ligand while treating the
receptor as rigid. Details of each docking engine and the corresponding
docking settings are described in Sections S9 to S10. The
highest-ranking docked conformers/poses were scored using 7 scoring
functions: Jain, LigScore1, LigScore2, PLP1, PLP2, PMF, and PMF04
(Section S11). 49-53 The docking-scoring cycles using both engines were
repeated to cover all possible docking combinations resulting from the
presence (or absence) of crystallographically explicit water molecules
within the binding site.

Taking into account each scoring function in turn, the
highest-scoring docked conformer/pose of each inhibitor was chosen to be
used in subsequent comparative intermolecular contacts analysis
(dbCICA) modeling. 39,40 This step resulted in 7 docking/scoring
combinations of the 27 compounds, each of them scored with a
corresponding scoring function. The docking and scoring cycle was
repeated 2 times to cover all combinations of docking conditions, ie,
the presence or absence of explicit water molecules. The resulting 14
docking/scoring sets were used in dbCICA modeling as described
previously. 39,40 Sections S12 to S13 describe details of dbCICA
modeling. Successful dbCICA models were used to guide the manual
building of pharmacophores (Section S14).
2.3 | Validation and steric refinement of pharmacophore models


Optimal pharmacophores (both structure and ligand based) were
validated using receiver operating characteristic (ROC) curve
analysis to assess the ability of each model to correctly classify a group
of compounds into actives and inactives (Section S15). 39,40,54
The Matthews correlation coefficient (MCC) was also calculated as an
additional validation. 55 Additionally, exclusion spheres were added
using the HIPHOP-REFINE module of Discovery Studio to improve the
ROC properties of the QSAR-guided pharmacophore (Section S8).
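For readers unfamiliar with the validation metrics named above, the following sketch computes overall accuracy (ACC), specificity (SPC), true positive rate (TPR), and MCC from a confusion matrix using their standard definitions. The counts in the example are hypothetical and are not taken from the validation sets used in this study.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Return ACC, SPC, TPR, and MCC for a binary classifier.

    tp/fn count actives captured/missed by a pharmacophore;
    tn/fp count decoys rejected/captured.
    """
    acc = (tp + tn) / (tp + fp + tn + fn)
    spc = tn / (tn + fp)
    tpr = tp / (tp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "SPC": spc, "TPR": tpr, "MCC": mcc}


# Hypothetical screen: 10 actives and 100 decoys.
metrics = classification_metrics(tp=8, fp=10, tn=90, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because decoys usually far outnumber actives, MCC tends to be much lower than ACC for the same model, so it gives a stricter view of classification quality on imbalanced sets.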


2.4 | Virtual screening for new HKU4-CoV 3CLpro inhibitors

The selected pharmacophores were used as 3D search queries to
screen the National Cancer Institute (NCI) database 56 for new 3CLpro
inhibitors.

Hits captured by the QSAR-guided pharmacophore were filtered
by the Lipinski criteria to ensure good pharmacokinetic properties 57
and the SMILES arbitrary target specification (SMARTS) filter (Section
S16) to remove reactive ligands (eg, alkyl halides or Michael
acceptors). 58 Remaining hits were fitted against the corresponding
individual pharmacophores. The fit values were then substituted in
the MLR-based QSAR models to predict the hits' bioactivities (-log(IC50)).
The highest-ranking hits were selected for in vitro testing using a
voting system to minimize the influence of QSAR-based predictions
on hit prioritization. In this system, each of a hit's fit values and the hit's
overall QSAR prediction casts a vote of “one” if the value is within
the top 20% of all captured hits; otherwise the vote is “zero.”
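The voting scheme described above can be sketched as below, under stated assumptions: each criterion (a pharmacophore fit value or the QSAR prediction) is treated as "higher is better", the top-20% cutoff is rounded up to a whole number of hits, and ties are broken by sort order. The hit names and scores are hypothetical.

```python
import math


def top_fraction_votes(scores, fraction=0.20):
    """Count, for each hit, how many criteria place it in the top fraction.

    scores maps a hit name to a list of criterion values, one value per
    criterion, in the same order for every hit.
    """
    hits = list(scores)
    n_criteria = len(next(iter(scores.values())))
    n_top = max(1, math.ceil(fraction * len(hits)))
    votes = {hit: 0 for hit in hits}
    for j in range(n_criteria):
        ranked = sorted(hits, key=lambda h: scores[h][j], reverse=True)
        for hit in ranked[:n_top]:
            votes[hit] += 1  # a vote of "one"; all other hits get "zero"
    return votes


# Hypothetical hits: [pharmacophore fit value, QSAR-predicted activity].
scores = {
    "hitA": [6.1, 0.30],
    "hitB": [5.0, 0.21],
    "hitC": [4.2, 0.10],
    "hitD": [3.9, 0.12],
    "hitE": [2.5, 0.05],
}
votes = top_fraction_votes(scores)
print(sorted(votes.items(), key=lambda kv: -kv[1]))
```

Passing `fraction=0.10` gives the stricter cutoff used for the docking-based votes; summing votes means no single prediction dominates the ranking.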

Similarly, hits captured from all successful dbCICA-derived
pharmacophores were pooled together and filtered according to the
Lipinski criteria 57 and SMARTS filter. 58 The hits were then docked into
the HKU4-CoV 3CLpro binding pocket (4YOI) using the same
docking/scoring conditions of each successful dbCICA model. The resulting
docked poses were then analyzed for critical contacts (according to
successful dbCICA models), and the sums of critical contacts for each
hit compound were used for the prediction of their corresponding
IC50 values. The highest-ranking hits were selected for in vitro testing
using a similar voting system to that described above: each docking
solution casts a vote of “one” if the predicted value is within the top
10% of all captured hits; otherwise it casts a vote of “zero.”


2.5 | Protein expression and purification

MERS-CoV 3CLpro was expressed through auto-induction in
Escherichia coli BL21-DE3 cells in the presence of 100 µg/mL of
carbenicillin as described previously. 30,59 Cells were harvested by
centrifugation at 5000g for 20 minutes at 4°C, and the pellets were
stored at -80°C until further use. MERS-CoV 3CLpro purification was
performed using consecutive steps of hydrophobic-interaction
chromatography, DEAE anion-exchange chromatography, Mono S
cation-exchange chromatography, and size-exclusion chromatography
as described previously. 30 HKU4-CoV 3CLpro was produced and
purified using a modified protocol from Agnihothram et al. 60 Final
protein yield was calculated based on the measurement of total
activity units (µM product/min), specific activity (units/mg), and
milligrams of protein obtained (BioRad protein assay) after each
chromatographic step.

2.6 | Inhibition assays

Inhibition assays were conducted as described previously. 36 Each of
the acquired hits was screened for inhibition of HKU4 3CLpro and
MERS 3CLpro at a concentration of 40 µM in duplicate assays
containing the following assay buffer: 50 mM HEPES, 0.1 mg/mL
BSA, 0.01% Triton X-100, 2 mM DTT. Compound 1 (the most potent
compound in the training set; Table S1; St. John et al 36, table 1A) was
used as a positive control. The assays were conducted in Costar
3694 EIA/RIA 96-Well Half Area, Flat Bottom, Black Polystyrene
plates (Corning, New York). A total of 1 µL of 100X inhibitor stock
in dimethyl sulfoxide (DMSO) was added to 79 µL of enzyme in
assay buffer, and the enzyme-inhibitor mixture was incubated for
5 minutes. The reaction was initiated by the addition of 20 µL of
10 µM UIVT3 substrate, a custom-synthesized Förster resonance
energy transfer substrate peptide with the following sequence:
HilyteFluor 488-ESATLQSGLRKAK-QXL520-NH2, producing final
concentrations of 250 nM HKU4-CoV 3CLpro, 500 nM MERS-CoV
3CLpro, and 100 µM UIVT3 substrate. The fluorescence intensity of
the reaction was then measured over time as relative fluorescence
units (RFUt) for a period of 10 minutes, using an excitation
wavelength of 485 nm and bandwidth of 20 nm and monitoring emission
at 528 nm and bandwidth of 20 nm using a BioTek Synergy H1
multimode microplate reader. The inhibition of the HKU4-CoV 3CLpro
and MERS-CoV 3CLpro by hit compounds was monitored by
following the change in RFUs over time, using the initial slope of the
progress curve to determine the initial rate (Vi). The percent inhibition of
each 3CLpro enzyme was determined using the following equation:


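$$
\%\,\text{Inhibition} = \left[ 1 - \frac{\text{Inhibited 3CL}^{\text{pro}}\ \text{RFU/s} - \text{Background RFU/s}}{\text{Uninhibited 3CL}^{\text{pro}}\ \text{RFU/s} - \text{Background RFU/s}} \right] \times 100. \tag{1}
$$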
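Equation 1 can be expressed as a short function, shown here as an illustrative sketch. The argument names and the example rates are hypothetical; the rates stand for initial slopes of the progress curves in RFU/s.

```python
def percent_inhibition(inhibited_rate, uninhibited_rate, background_rate):
    """Percent inhibition from initial rates (RFU/s), following Equation 1.

    The background rate is subtracted from both the inhibited and the
    uninhibited rate before taking their ratio.
    """
    signal = uninhibited_rate - background_rate
    if signal == 0:
        raise ValueError("uninhibited rate equals background rate")
    return (1.0 - (inhibited_rate - background_rate) / signal) * 100.0


# Hypothetical rates: inhibited 30, uninhibited 110, background 10 RFU/s.
print(round(percent_inhibition(30.0, 110.0, 10.0), 1))
```

A value of 0 means the inhibited rate equals the uninhibited rate, and 100 means the inhibited rate has dropped to background.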


The IC50 values were determined at an ambient temperature from
100-µL assays performed in triplicate in the following buffer: 50 mM
HEPES, 0.1 mg/mL BSA, 0.01% Triton X-100, 2 mM DTT. Kinetic assays
were conducted in Costar 3694 EIA/RIA 96-Well Half Area, Flat
Bottom, Black Polystyrene plates (Corning, NY). Each inhibitor was
tested at concentrations ranging from 2.5 µM to 400 µM. A total of
1 µL of 100X inhibitor stock in DMSO was added to 79 µL of enzyme
in assay buffer, and the enzyme-inhibitor mixture was incubated for
5 minutes. The reaction was initiated by the addition of 20 µL of
10 µM UIVT3 substrate, producing final concentrations of 250 nM
HKU4-CoV 3CLpro, 500 nM MERS-CoV 3CLpro, and 2 µM UIVT3
substrate. The fluorescence intensity of the reaction was then
measured over time as RFUt for a period of 10 minutes, using an
excitation wavelength of 485 nm and bandwidth of 20 nm and
monitoring emission at 528 nm and bandwidth of 20 nm using a BioTek
Synergy H1 multimode microplate reader. The percent inhibition of the
3CLpro enzymes was then plotted as a function of inhibitor
concentration. The SigmaPlot Enzyme Kinetics Wizard was used to fit the
triplicate percent inhibition data and associated standard error to a
nonlinear Michaelis-Menten type regression model and determine
the IC50 for each enzyme using the following equation:


$$
\%\,\text{Inhibition} = \frac{\%I_{\max} \times [\text{Inhibitor}]}{\text{IC}_{50} + [\text{Inhibitor}]} \tag{2}
$$


where %Imax is the percent maximum inhibition of 3CLpro and the error
in IC50 values was determined as the error in the fitted parameter.
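To illustrate how IC50 and %Imax are recovered from the hyperbolic model of Equation 2, the sketch below uses a Hanes-Woolf-style linearization ([I]/%Inhibition against [I]) solved by ordinary least squares. This is a swapped-in, simplified technique for illustration and is not the nonlinear regression performed with SigmaPlot; the concentration series and the "true" parameters used to generate the synthetic data are hypothetical.

```python
def hyperbolic_inhibition(conc, ic50, i_max):
    """Percent inhibition predicted by Equation 2."""
    return i_max * conc / (ic50 + conc)


def fit_ic50_linearized(concs, inhibitions):
    """Estimate (IC50, %Imax) from the linearized form of Equation 2.

    Rearranging Equation 2 gives [I]/%Inh = [I]/Imax + IC50/Imax,
    a straight line in [I] with slope 1/Imax and intercept IC50/Imax.
    """
    ys = [c / i for c, i in zip(concs, inhibitions)]
    n = len(concs)
    mean_x = sum(concs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in concs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(concs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    i_max = 1.0 / slope
    ic50 = intercept * i_max
    return ic50, i_max


# Synthetic, noise-free data spanning 2.5 to 400 µM (hypothetical series).
concs = [2.5, 5.0, 10.0, 25.0, 50.0, 100.0, 200.0, 400.0]
data = [hyperbolic_inhibition(c, ic50=20.0, i_max=90.0) for c in concs]
ic50_fit, i_max_fit = fit_ic50_linearized(concs, data)
print(round(ic50_fit, 3), round(i_max_fit, 3))
```

With noisy experimental data, a linearized fit weights the points differently from a direct nonlinear fit, so the two approaches can give somewhat different estimates and error values.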

Controls were performed in which the enzyme, the substrate, or
both were omitted. Fluorescence attenuation controls were
carried out by adding the inhibitors to the cleaved substrate in a reaction
mixture identical to that used in the inhibition assays.
3 | RESULTS AND DISCUSSION

3.1 | Ligand-based approach: QSAR-guided pharmacophore modeling

The pharmacophoric space of 221 HKU4-CoV 3CLpro inhibitors was
extensively explored through 112 HYPOGEN automatic runs
performed on 14 carefully selected training subsets comprising 14 to
22 compounds (Section 2.1 and Tables S1 and S2). The training
compounds in each subset were selected to ensure
that each set represents a common binding mode and that
bioactivity differences among its members are attributable to the
presence or absence of pharmacophoric features. Applying this
strategy allows an effective exploration of the pharmacophoric
space of HKU4-CoV 3CLpro inhibitors and helps to identify
pharmacophoric hypotheses representing all possible binding modes
assumed by 3CLpro. 38,42-46 These runs resulted in 677 successful
pharmacophore models, which were then clustered using the
hierarchical average linkage method available in CATALYST. The best 68
representative models were used in subsequent QSAR modeling
(Section 2.1).

The fit values obtained by mapping the 68 representative
pharmacophores against the HKU4-CoV 3CLpro inhibitors were
enrolled together with a selection of 2D descriptors as independent
variables in QSAR analysis.

The genetic function algorithm combined with MLR analyses was used
to select different combinations of pharmacophores and 2D molecular
descriptors that are capable of explaining bioactivity variation among
collected inhibitors.

However, all attempts to achieve statistically successful QSAR
models with -log(IC50) as the response variable failed, prompting the
use of ligand efficiency [LE = -log(IC50)/heavy atom count] as an
alternative response variable. 61-64 The best QSAR models are summarized in Equations 3
and 4. Figure 2A,B shows the corresponding scatter plots of
experimental versus estimated bioactivities for training and testing inhibitors.
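The ligand efficiency used as the response variable, LE = -log(IC50)/heavy atom count, can be computed as in the sketch below. It assumes a base-10 logarithm and leaves the concentration unit of IC50 to the caller, since the unit used for LE is not restated in this section; the compounds and values are hypothetical.

```python
import math


def ligand_efficiency(ic50, heavy_atom_count):
    """LE = -log10(IC50) / number of non-hydrogen atoms.

    The IC50 unit is whatever the model was built with (an assumption
    left to the caller); it shifts every LE value but not the formula.
    """
    if ic50 <= 0 or heavy_atom_count <= 0:
        raise ValueError("IC50 and heavy atom count must be positive")
    return -math.log10(ic50) / heavy_atom_count


# Two hypothetical inhibitors with equal potency but different size.
small = ligand_efficiency(ic50=0.1, heavy_atom_count=20)
large = ligand_efficiency(ic50=0.1, heavy_atom_count=40)
print(round(small, 4), round(large, 4))
```

Normalizing by heavy atom count rewards compounds that reach a given potency with fewer atoms, which changes the bioactivity ranking relative to -log(IC50) alone.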


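$$
\begin{aligned}
\text{LE} = {} & -0.12 + 1.98\times10^{-3}\,(\textit{AromaticBonds}) + 5.95\times10^{-4}\,(\textit{Dipole}) \\
& - 1.22\times10^{-3}\,(\textit{DipoleX}) - 6.64\times10^{-4}\,(\textit{DipoleY}) \\
& - 9.7\times10^{-2}\,(\textit{LUMO}) + 2.22\times10^{-3}\,[\text{Hypo(K-T5-3)}] \\
& + 4.73\times10^{-3}\,[\text{Hypo(L-T5-2)}]
\end{aligned}
\tag{3}
$$

n = 177, r² = 0.637, F-statistic = 42.408, r²LOO = 0.572, r²PRESS = 0.675.

$$
\begin{aligned}
\text{LE} = {} & -0.11 + 1.99\times10^{-3}\,(\textit{AromaticBonds}) - 9.53\times10^{-4}\,(\textit{DipoleX}) \\
& - 6.58\times10^{-4}\,(\textit{DipoleY}) - 9.30\times10^{-2}\,(\textit{LUMO}) \\
& + 4.89\times10^{-3}\,[\text{Hypo(L-T5-2)}] + 2.39\times10^{-3}\,[\text{Hypo(N-T1-1)}]
\end{aligned}
\tag{4}
$$

n = 177, r² = 0.625, F-statistic = 47.298, r²LOO = 0.584, r²PRESS = 0.647.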

where n is the number of training compounds used to generate this
equation, F is the Fisher statistic, r²LOO is the leave-one-out
cross-validation correlation coefficient, and r²PRESS is the predictive r²
determined for 44 randomly selected test compounds. AromaticBonds
is the number of aromatic bonds in the molecule; Dipole, DipoleX,
and DipoleY are dipole moment descriptors that indicate the
strength and orientation behavior of a molecule in an electrostatic
field; LUMO is the energy of the lowest unoccupied molecular
orbital; 65 and Hypo(L-T5-2), Hypo(K-T5-3), and Hypo(N-T1-1) represent
the fit values of the training compounds against the corresponding
pharmacophores (see Table S3). Figure 3 shows the 3 pharmacophores
and how they fit the most potent training compound
(1, IC50 = 0.33 µM 36 ).

FIGURE 2 Experimental versus predicted bioactivities for the training and testing compounds. Predicted bioactivities calculated using the best
QSAR models: (A) Equation 3 and (B) Equation 4. The solid line is the regression line for the fitted and predicted bioactivities of training and
test compounds, respectively, whereas the dotted lines indicate arbitrary error margins.
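As a worked illustration of how a fitted model of this kind is applied to a screening hit, the sketch below evaluates Equation 4 as a plain linear function, using the coefficients as printed in that equation. The descriptor values in the example are hypothetical placeholders; real inputs would be the computed descriptors and pharmacophore fit values for a given compound.

```python
# Coefficients of Equation 4 as reported in the text.
EQ4_INTERCEPT = -0.11
EQ4_COEFFICIENTS = {
    "AromaticBonds": 1.99e-3,
    "DipoleX": -9.53e-4,
    "DipoleY": -6.58e-4,
    "LUMO": -9.30e-2,
    "Hypo(L-T5-2)": 4.89e-3,
    "Hypo(N-T1-1)": 2.39e-3,
}


def predict_le(descriptors):
    """Predicted ligand efficiency: intercept plus weighted descriptor sum."""
    return EQ4_INTERCEPT + sum(
        coef * descriptors[name] for name, coef in EQ4_COEFFICIENTS.items()
    )


# Hypothetical descriptor values for one compound.
example = {
    "AromaticBonds": 12,
    "DipoleX": 0.0,
    "DipoleY": 0.0,
    "LUMO": 0.0,
    "Hypo(L-T5-2)": 0.0,
    "Hypo(N-T1-1)": 0.0,
}
print(round(predict_le(example), 5))
```

Keeping the coefficients in a dictionary keyed by descriptor name makes it easy to see which terms drive a prediction for a given hit.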

The appearance of the AromaticBonds descriptor with
positive slopes in both QSAR equations indicates that HKU4-CoV
3CLpro inhibitory activity increases with the number of
aromatic rings in the inhibitor structure. This is to be expected, as
the binding pocket is rich in aromatic amino acids (His41, His166,
His175, Tyr54, and Phe143). Stacking of the ligands' aromatic rings
against these aromatic residues in the binding pocket is likely to lead
to high binding affinity. However, the emergence of several dipole
moment descriptors (Dipole, DipoleX, and DipoleY) with both
positive and negative regression coefficients in Equations 3 and 4
suggests an obscure role of the ligands' dipole moments in binding
within the enzyme-binding pocket.

The emergence of LUMO in Equations 3 and 4 combined with
negative slopes suggests that ligand/HKU4-CoV 3CLpro affinity favors
electrophilic ligands, perhaps due to π-stacking against certain
electron-rich aromatic centers in the binding pocket (eg, the aromatic
rings of Tyr54 and Phe143).

The emergence of 3 pharmacophores, Hypo(K-T5-3),
Hypo(N-T1-1), and Hypo(L-T5-2), in Equations 3 and 4 suggests possible
multiple or complementary binding modes exhibited by ligands within
the binding pocket. Receiver operating characteristic analysis of the 3
pharmacophores shows that Hypo(K-T5-3) and Hypo(N-T1-1) are
significantly superior to Hypo(L-T5-2) (Table 1). Furthermore, MCC of the
3 pharmacophores reflects the very weak classification abilities of
Hypo(L-T5-2) (Table 1).

FIGURE 3 Pharmacophoric features of the QSAR-guided pharmacophores and the corresponding merged model: green-vectored spheres: HBA;
blue spheres: Hbic; purple-vectored spheres: HBD; and orange-vectored spheres: RingArom. (A) Hypo(N-T1-1), (B) Hypo(K-T5-3), (C)
Merged-Hypo(K-T5-3/N-T1-1), (D) Refined Merged-Hypo(K-T5-3/N-T1-1), and (E) Hypo(L-T5-2) fitted against co-crystallized ligand within HKU4-CoV
3CLpro (compound 1, IC50 = 0.33 µM, PDB code 4YOI, 1.8 Å). (F) Ligand co-crystallized within HKU4-CoV 3CLpro and the chemical structure of the
co-crystallized ligand. Arrows point to closely positioned common features in Hypo(N-T1-1) and Hypo(K-T5-3) allowing for merging. The 3D
coordinates of these pharmacophores are shown in Table S6. HBA, hydrogen bond acceptor; HBD, hydrogen bond donor

TABLE 1 ROC and MCC performances of QSAR-guided pharmacophores

| Pharmacophore Model | ROC-AUC | ACC | SPC | TPR | MCC |
|---|---|---|---|---|---|
| Hypo(L-T5-2) | 0.78 | 0.09 | 0.05 | 1.00 | 0.048 |
| Hypo(K-T5-3) | 0.78 | 0.52 | 0.50 | 0.74 | 0.099 |
| Hypo(N-T1-1) | 0.81 | 0.63 | 0.63 | 0.63 | 0.109 |
| Hypo(K-T5-3/N-T1-1) | 0.93 | 0.88 | 0.90 | 0.52 | 0.263 |
| Refined Hypo(K-T5-3/N-T1-1) | 0.94 | 0.89 | 0.91 | 0.48 | 0.262 |

Abbreviations: ACC, overall accuracy; AUC, area under the curve; MCC,
Matthews correlation coefficient; ROC, receiver operating characteristic;
SPC, overall specificity; TPR, overall true positive rate.

The very poor classification power of Hypo(L-T5-2) prompted us
to exclude it from subsequent modeling efforts. However,
Hypo(K-T5-3) and Hypo(N-T1-1) (Figure 3A,B) have 3 pharmacophoric features
in common: hydrophobic (Hbic), ring aromatic (RingArom), and
hydrogen bond acceptor (HBA) features. The close resemblance between
these 2 pharmacophores combined with their equivalent contributions
to bioactivity (as indicated by their slopes in QSAR Equations 3 and 4)
suggests that they might represent a common binding mode assumed
by ligands within the HKU4-CoV 3CLpro binding pocket. Therefore,
these 2 pharmacophores were merged into a single binding model
(Hypo(K-T5-3/N-T1-1); Figure 3).

Interestingly, Hypo(K-T5-3/N-T1-1) showed noticeable
improvement in distinguishing actives from decoys as indicated by the ROC
analysis and MCC values (Table 1). Merging pharmacophores that
share common features has been reported to improve the
performance of pharmacophores in capturing active molecules. 66
Additionally, Hypo(K-T5-3/N-T1-1) was further modified by adding exclusion
spheres (Section S8 and Table S6) to further enhance its ROC profile
(Table 1). Exclusion volumes resemble inaccessible regions within the
binding site. Figure 3D shows the sterically refined version of
Hypo(K-T5-3/N-T1-1) complemented with eight exclusion volumes.

Moreover, Hypo(K-T5-3/N-T1-1) maps the most potent ligand 1
(IC50 = 0.33 µM) in a way that closely resembles the interactions
observed in the co-crystallized structure of the same compound with
HKU4-CoV 3CLpro (4YOI) (Figure 3). The close proximity between
the ligand's thiophenoyl moiety and the sulfide of Met25 (Figure 3F)
suggests the presence of a mutual hydrophobic interaction, which
correlates with mapping the same ring against a Hbic feature in
Hypo(K-T5-3/N-T1-1) (Figure 3C). Similarly, mapping the carbonyl
of the same thiophenoyl moiety against the HBA feature in
Hypo(K-T5-3/N-T1-1) (Figure 3C) agrees with the hydrogen bonding interaction
connecting this carbonyl to the thiol of Cys145 (Figure 3F). Likewise,
the hydrogen bonding interaction connecting the amidic NH of the
ligand to the peptidic carbonyl of His41 via a bridging water molecule
agrees with mapping the same NH against hydrogen bond donor
(HBD) features in Hypo(K-T5-3/N-T1-1) (Figure 3F). Mapping the
ligand's benzotriazole ring against the RingArom feature in
Hypo(K-T5-3/N-T1-1) (Figure 3C) correlates with stacking this ring system against
the peptide amide connecting Cys145 and Leu144 in the binding
pocket (Figure 3F). Finally, the hydrogen bonding interaction anchoring
the ligand's tertiary amide carbonyl to the peptide NH of Glu169
corresponds to fitting the same carbonyl against the HBA feature in
Hypo(K-T5-3/N-T1-1) (Figure 3C). These findings showed that
Hypo(K-T5-3/N-T1-1) represents a valid binding mode exhibited by
the ligands within the binding pocket of HKU4-CoV 3CLpro. These
interactions, highlighted by the pharmacophoric features within this
model, are very likely to be critical for ligand-binding affinity.


3.2 | Structure-based approach: dbCICA modeling

Structure-based pharmacophore models for HKU4-CoV 3CLpro were
obtained by using dbCICA. In this approach, a subset of inhibitors
(1-27, Table S1) was docked into the HKU4-CoV 3CLpro binding
pocket using LibDock 47 and CDOCKER 48 (Section 2.2). The
highest-ranking conformers/poses based on each scoring function were
aligned together to construct a corresponding dbCICA model. A genetic
algorithm was then used to search for the best combination of
ligand-receptor intermolecular contacts capable of explaining bioactivity
variation across the training compounds. Table 2 shows the contact
distance thresholds, number of positive and negative contacts, and
statistical criteria of the best dbCICA models. Table 3 shows the critical
binding site contact atoms proposed by optimal dbCICA models. The
highest-ranking dbCICA models exhibited excellent statistical criteria
and were anticipated to act as good templates for building
corresponding pharmacophore models (Table 2). Figure 4 shows how dbCICA
model SB-1 (Tables 2 and 3) was converted into its corresponding
pharmacophore model Hypo(SB-1) as an example. The emergence of
significant positive contact atoms at Pro45 and HOH225 (Figure 4A)
combined with the consensus among potent docked ligands to
position hydrophobic alkyl, cycloalkyl, or aromatic rings nearby
(within 3.5 Å from Pro45 and HOH225, Figure 4C) prompted us to
place an Hbic feature onto these ligand groups (Figure 4D). It is likely that
hydrophobic fragments of the ligands interact with the side chain of
Ala46.
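The idea of a distance-thresholded intermolecular contact can be sketched as follows. This is a simplified illustration, not the dbCICA implementation: a ligand is scored 1 against a binding-site atom if any of its atoms lies within the 3.5 Å threshold, and the scores over a set of critical site atoms are summed. Treating the threshold as inclusive is an assumption, and all coordinates are hypothetical.

```python
import math


def has_contact(ligand_atoms, site_atom, threshold=3.5):
    """Return 1 if any ligand atom lies within threshold (Å) of site_atom."""
    return int(any(math.dist(atom, site_atom) <= threshold
                   for atom in ligand_atoms))


def contact_sum(ligand_atoms, critical_site_atoms, threshold=3.5):
    """Sum of binary contacts over a list of critical binding-site atoms."""
    return sum(has_contact(ligand_atoms, site, threshold)
               for site in critical_site_atoms)


# Hypothetical docked ligand (two atoms) and two critical site atoms.
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
site_atoms = [(4.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
print(contact_sum(ligand, site_atoms))
```

In this picture, a column of such binary contacts across the docked training compounds is what gets correlated with bioactivity, and contacts that track potency are the ones carried forward into a pharmacophore.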

Similarly, the emergence of the amidic NH of Gln192 as a significant
positive contact in SB-1 combined with agreement among docked
potent training compounds on placing their central benzene rings near
to this contact suggested placing an Hbic feature onto these benzene
ligand fragments. Apparently, these rings are involved in a hydrophobic
interaction with the nearby thiol of Cys145 instead of π-stacking
(as the nearest aromatic amino acid residue is His41 at about 4.5 Å
away). This explains our decision to place an Hbic feature onto this region
of the ligands (ie, rather than a RingArom feature).

Likewise, the appearance of His166 and HOH241 as positive
contact points, combined with agreement among potent hits to
position their benzotriazoles close by, suggested placing a hydrophobic
aromatic (HbicArom) feature onto these benzotriazole moieties
(Figure 4E). An HbicArom feature was added onto these
rings instead of a vectored RingArom feature because the
benzotriazoles, although docked near the imidazole of His166,
did not exhibit typical π-stacking alignment with this residue. In
contrast, the appearance of positive contacts at His41 and Asp190,
combined with a consensus among docked potent inhibitors to project
their thiophene rings close to the nearby imidazole of His41,
suggests a mutual π-stacking interaction involving the electron-rich
ligands' thiophenes and the electron-deficient His41 imidazole. This
TABLE 2 The highest ranking dbCICA models and their corresponding parameters and statistical criteria^a

Model  Docking   Scoring    Positive     Negative     r²(27)^d  r²(LOO)^e  r²(5-fold)^f  F statistic
       Engine    Function   Contacts^b   Contacts^c
SB-1   CDOCKER   PMF        9            10           0.92      0.91       0.91          291.39
SB-2   CDOCKER   PMF        5            5            0.88      0.86       0.83          180.4
SB-3   LibDock   PLP2       5            10           0.90      0.88       0.87          221.48
SB-4   LibDock   PLP2       8            5            0.91      0.89       0.89          239.61
SB-5   LibDock   Lig2       5            5            0.86      0.84       0.84          147.68

Abbreviation: dbCICA, docking-based comparative intermolecular contacts analysis.

a All successful models listed herein were generated by docking the ligands into the binding site in the presence of crystallographically explicit water molecules and at ligand/binding site contact distance thresholds of 3.5 Å (Section S12).
b Optimal number of combined (ie, summed) bioactivity-enhancing ligand/binding site contacts.
c Optimal number of bioactivity-disfavoring ligand/binding site contacts.
d Non-cross-validated correlation coefficient for 27 training compounds.
e Cross-validation correlation coefficients determined by the leave-one-out technique.
f Cross-validation correlation coefficients determined by the leave-20%-out technique repeated 5 times.
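
The cross-validated statistics in Table 2 (footnotes e and f) can be illustrated with a short sketch. It assumes a single-descriptor linear model and the common definition of cross-validated r² as 1 minus PRESS divided by the total sum of squares; the exact formula used in the study is not stated in this section, and the data and function names are hypothetical.

```python
import numpy as np

def cross_validated_r2(x, y, folds):
    """Predict each held-out fold from a line fitted to the remaining
    compounds, then return 1 - PRESS / total sum of squares."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    pred = np.empty_like(y)
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        slope, intercept = np.polyfit(x[train], y[train], 1)
        pred[test_idx] = slope * x[test_idx] + intercept
    press = np.sum((y - pred) ** 2)
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def loo_folds(n):
    """Leave-one-out: every compound is its own test fold."""
    return [[i] for i in range(n)]

def k_folds(n, k, seed=0):
    """Random split into k folds (k = 5 leaves out roughly 20% each time)."""
    rng = np.random.default_rng(seed)
    return [part.tolist() for part in np.array_split(rng.permutation(n), k)]

# Hypothetical summed contacts and -log(IC50) values for 6 compounds.
contacts_sum = [3, 2, 1, 0, 0, 2]
activity = [5.9, 5.5, 5.1, 4.6, 4.7, 5.4]

print(cross_validated_r2(contacts_sum, activity, loo_folds(6)))
print(cross_validated_r2(contacts_sum, activity, k_folds(6, 5)))
```

A cross-validated value close to the non-cross-validated r², as in Table 2, suggests that no single compound or fold dominates the fit.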


TABLE 3 Critical binding site contact atoms proposed by optimal dbCICA models

dbCICA    Favored Contact Atoms (Positive Contacts)^b    Disfavored Contact Atoms
Model^a   Amino acids and          Weights^d             (Negative Contacts)^e
          atom identities^c
SB-1      ASP190:CB                2                     CYS145:CB; CYS145:HB2; GLN167:O; GLN192:HA; GLN192:HG1;
          CYS145:HB1               1                     LEU144:C; LEU144:HD22; MET168:SD; HOH216:H1; HOH234:H1
          GLN192:HE21              2
          GLU169:HN                2
          HIS166:NE2               3
          HIS41:CB                 1
          PRO45:CA                 1
          HOH225:H1                3
          HOH241:O                 3
SB-2      PRO45:CA                 1                     LEU144:C; LYS191:HN; MET168:SD; MET25:SD; CYS145:HG
          ASP190:O                 3
          GLU169:OE1               3
          HIS166:NE2               1
          PHE143:C                 2
SB-3      ASP190:C                 3                     CYS44:HB1; CYS44:HB2; GLN195:HB1; HIS41:O; LYS191:C; LYS191:HN;
          HIS194:HN                1                     MET25:CG; MET25:N; PRO52:HD1; HOH116:H1
          MET168:HB2               3
          PHE143:CA                3
          SER24:HB2                2
SB-4      ASP190:C                 3                     GLN192:CD; GLU169:O; LEU49:CG; LEU49:HB2; MET168:HE2
          HIS41:HD2                3
          LEU144:HA                2
          MET168:HB2               3
          MET168:SD                2
          PHE143:C                 1
          THR193:N                 3
          HOH217:O                 2
SB-5      ALA46:CB                 2                     ASP190:CB; CYS44:HB2; GLN167:O; HIS175:CD2; THR193:C
          ASP190:C                 1
          PHE143:O                 2
          PRO52:HG2                3
          HOH401:H1                3

Abbreviation: dbCICA, docking-based comparative intermolecular contacts analysis.

a As in Table 2.
b Bioactivity-proportional ligand/binding site contacts.
c Binding site amino acids and their atomic contacts. Atom codes are as provided by the PDB file except for hydrogen atoms, which were coded by Discovery Studio.
d Degree of significance (weight) of the corresponding contact atom, ie, the number of times it emerged in the final dbCICA model (see Section S12).
e Bioactivity-disfavoring ligand/binding site contacts.




FIGURE 4 Steps used in the manual generation of binding model Hypo(SB-1) as guided by dbCICA model SB-1 (Tables 2 and 3): (A) The
binding site moieties selected by dbCICA model SB-1 with significant contact atoms shown as spheres. (B) The docked pose of the potent
training compound 3 (IC50 = 1.2 μM) within the binding pocket. (C) The docked poses of the potent compounds 3, 4, 5, 6, and 8. (D) Manually
placed pharmacophoric features onto chemical moieties common among docked potent compounds 3, 4, 5, 6, and 8. (E) The docked pose of 3
and how it relates to the proposed pharmacophoric features. (F) Exclusion spheres fitted against binding site atoms showing negative
correlations with bioactivity (dbCICA model SB-1). Green-vectored spheres: HBA, blue spheres: Hbic, violet spheres: HbicArom, and
orange-vectored spheres: RingArom. Exclusion spheres are shown in gray. dbCICA, docking-based comparative intermolecular contacts analysis; HBA,
hydrogen bond acceptor


observation supported placing a RingArom feature onto the
thiophene rings.

The emergence of a positive contact on the amidic NH of Glu169
(GLU169:HN, Table 3), together with the agreement of docked compounds
on placing their central amide oxygen close to this NH, indicated the
presence of a hydrogen bonding interaction and suggested placing an HBA
feature onto the ligand amidic carbonyl groups (Figure 4E). This interaction is very
likely to involve hydrogen bonding with the peptide amidic NH of
Glu169.

Finally, all contact points of negative correlation with bioactivity
were assumed to represent areas of steric clashes with the bound
ligand. Therefore, such contacts were used to define exclusion
volumes within the vicinity of the binding pocket, as shown in
Figure 4F.
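
The role of exclusion volumes during screening can be sketched as a simple geometric test: a candidate pose is penalized or rejected when any of its atoms falls inside a sphere centered on a disfavored binding-site atom. The coordinates and the sphere radius below are hypothetical, and the function name is illustrative only.

```python
import math

def clashes(ligand_atoms, exclusion_spheres):
    """Return (atom_index, sphere_index) pairs where a ligand atom
    centre lies inside an exclusion sphere."""
    hits = []
    for i, atom in enumerate(ligand_atoms):
        for j, (centre, radius) in enumerate(exclusion_spheres):
            if math.dist(atom, centre) < radius:
                hits.append((i, j))
    return hits

# One hypothetical exclusion sphere placed on a disfavored contact atom.
spheres = [((0.0, 0.0, 0.0), 1.2)]
# Two hypothetical ligand atom positions (angstroms).
atoms = [(0.5, 0.0, 0.0), (3.0, 0.0, 0.0)]

print(clashes(atoms, spheres))  # only the first atom is inside the sphere
```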

The same strategy was used to translate all other optimal dbCICA
models in Tables 2 and 3 into their corresponding pharmacophore
models (Figure 5). The X, Y, and Z coordinates of the resulting
pharmacophores are shown in Table S7. Subsequent validation using
ROC analysis (Table 4) illustrated the excellent classification power
of these pharmacophores in distinguishing actives from decoys
(ROC-AUC values of 0.897 to 0.976).
Matthews correlation coefficient values indicate that the structure-based
dbCICA models are superior in their classification ability to the
QSAR-guided pharmacophores.
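
The quantities reported in Table 4 follow standard definitions, which the sketch below computes for a toy set of actives and decoys. The labels, fit scores, and the 0.5 cutoff are hypothetical; the ROC-AUC is computed as the probability that a randomly chosen active scores higher than a randomly chosen decoy, with ties counted as one half.

```python
import math

def confusion_counts(labels, predicted):
    """Count true/false positives and negatives for binary labels."""
    tp = sum(1 for l, p in zip(labels, predicted) if l and p)
    fp = sum(1 for l, p in zip(labels, predicted) if not l and p)
    tn = sum(1 for l, p in zip(labels, predicted) if not l and not p)
    fn = sum(1 for l, p in zip(labels, predicted) if l and not p)
    return tp, fp, tn, fn

def classification_metrics(tp, fp, tn, fn):
    """Accuracy, specificity, true positive rate, and Matthews coefficient."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    spc = tn / (tn + fp)
    tpr = tp / (tp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "SPC": spc, "TPR": tpr, "MCC": mcc}

def roc_auc(labels, scores):
    """Fraction of (active, decoy) pairs ranked correctly by score."""
    pos = [s for l, s in zip(labels, scores) if l]
    neg = [s for l, s in zip(labels, scores) if not l]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical test list: 3 actives (1) and 5 decoys (0) with fit scores.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.1, 0.0]
predicted = [s >= 0.5 for s in scores]

metrics = classification_metrics(*confusion_counts(labels, predicted))
auc = roc_auc(labels, scores)
print(metrics, auc)
```

Because MCC uses all four cells of the confusion matrix, it is less inflated than accuracy when decoys greatly outnumber actives, which is why it is a useful basis for comparing pharmacophores.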

3.3 | In silico screening

The QSAR-guided, sterically refined, merged pharmacophore
Hypo(K-T5-3/N-T1-1) and 5 dbCICA-based pharmacophores (Hypo(SB-1) to
Hypo(SB-5)) were used as 3D search queries to screen the NCI virtual
database for small molecule inhibitors of 3CL pro. Captured hits were
filtered by the Lipinski criteria 57 and a SMARTS filter 58 as described in
Section 2.4.
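
As an illustration of the Lipinski step, the sketch below applies the standard rule-of-five thresholds (molecular weight at most 500, logP at most 5, at most 5 hydrogen bond donors, at most 10 hydrogen bond acceptors) to precomputed descriptors. The descriptor values, the `Descriptors` container, and the `max_violations` setting are hypothetical; the exact settings used in Section 2.4 are not restated here, and the SMARTS filter is not shown because it requires a cheminformatics toolkit.

```python
from dataclasses import dataclass

@dataclass
class Descriptors:
    """Precomputed molecular descriptors for one screening hit."""
    mol_weight: float   # daltons
    logp: float         # octanol/water partition coefficient
    h_donors: int       # hydrogen bond donors
    h_acceptors: int    # hydrogen bond acceptors

def lipinski_violations(d):
    """Number of rule-of-five thresholds exceeded."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])

def passes_lipinski(d, max_violations=0):
    """True when the hit exceeds no more than `max_violations` thresholds."""
    return lipinski_violations(d) <= max_violations

# Two hypothetical hits.
small_hit = Descriptors(mol_weight=342.4, logp=2.8, h_donors=2, h_acceptors=5)
bulky_hit = Descriptors(mol_weight=612.7, logp=6.1, h_donors=3, h_acceptors=9)

print(passes_lipinski(small_hit), passes_lipinski(bulky_hit))
```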

The QSAR-guided hits were fitted against the component
pharmacophores (ie, Hypo(K-T5-3), Hypo(N-T1-1), and Hypo(L-T5-2)),
and their fit values were substituted into MLR-QSAR Equations 3 and
4 to predict their bioactivities. The 39 highest-ranking compounds
(prioritized using the voting system described in
Section 2.4) that were available in the NCI Open Chemicals Repository
were acquired for in vitro testing.

On the other hand, filtered dbCICA-derived hits were docked into the
HKU4-CoV 3CL pro protein using the same docking conditions as each
FIGURE 5 dbCICA pharmacophores derived from successful dbCICA models in Tables 2 and 3.
(A) Hypo(SB-1) mapped against training compounds 5 and 6 (IC50 = 1.5 μM and 1.6 μM,
respectively; Table S1), (B) Hypo(SB-2) mapped against 5 and 6, (C) Hypo(SB-3) fitted
against 5, (D) Hypo(SB-4) mapped against 6, and (E) Hypo(SB-5) mapped against 5.
Green-vectored spheres: HBA, purple-vectored spheres: HBD, blue spheres: Hbic, violet
spheres: HbicArom, and orange-vectored spheres: RingArom. Exclusion spheres are
shown in gray. dbCICA, docking-based comparative intermolecular contacts analysis;
HBA, hydrogen bond acceptor; HBD, hydrogen bond donor


successful dbCICA model (SB-1, SB-2, SB-3, SB-4, and SB-5; Tables 2
and 3) to predict their corresponding inhibitory IC50 values (Section
2.4). The hits were ranked and prioritized using the voting system
described in Section 2.4, and the top 39 compounds were acquired
for in vitro testing. Thus, a total of 78 compounds from the NCI
Open Chemicals Repository were acquired for testing.

3.4 | In vitro validation

A total of 78 NCI compounds (Figure S1), comprising 39 QSAR-guided hits and 39
dbCICA-derived hits, were acquired and screened in vitro

TABLE 4 ROC and MCC performances of the dbCICA-based pharmacophores

Pharmacophore Model   ROC-AUC   ACC     SPC     TPR     MCC
Hypo(SB-1)            0.946     0.495   0.726   0.815   0.241
Hypo(SB-2)            0.976     0.632   0.944   0.666   0.713
Hypo(SB-3)            0.932     0.573   0.854   0.666   0.283
Hypo(SB-4)            0.971     0.615   0.918   0.666   0.384
Hypo(SB-5)            0.897     0.425   0.611   0.963   0.254

Abbreviations: ACC, overall accuracy; AUC, area under the curve; MCC,
Matthews correlation coefficient; ROC, receiver operating characteristic;
SPC, overall specificity; TPR, overall true positive rate.


to determine their inhibitory activity against HKU4-CoV 3CL pro and
MERS-CoV 3CL pro at a hit concentration of 40 μM. The 3CL pro enzyme
assay used in this study was designed to avoid misleading
false positives and wasted follow-up on promiscuous
compounds by adding albumin, DTT, and Triton X-100 to the reaction
mixture. Tables S8 and S9 show the % inhibition of 3CL pro by
the hits captured by the QSAR-guided and the dbCICA-derived
pharmacophores, respectively.

Only a single compound (NCI code 134140) of the 39 tested hits
captured by the QSAR-guided pharmacophores showed inhibitory
activity >50% against both HKU4-CoV 3CL pro and MERS-CoV 3CL pro.
However, this compound has a molecular fragment known to cause
pan-assay interference (PAINS-like; Baell 67 ) and therefore was not
considered as a hit in further characterizations. Three compounds of the
same ligand-based hits (NCI codes: 12156, 22906, and 28562; Table
S8) showed unexpectedly large negative inhibition values against
MERS-CoV 3CL pro (-633.2%, -203.4%, and -662.6% at 40 μM; Table
S8). Several controls were performed in which either the substrate or
the enzyme or both were omitted from the assay (data not shown).
None of these hits showed evidence of fluorescence interference. It
is possible that these compounds act as activators of the
enzyme; however, further evidence is still needed to support this
hypothesis. It was previously observed that designed reversible
peptidomimetic inhibitors acted as activators at low compound
concentrations as a result of induced dimerization. 30 These 3 hits
will therefore not be discussed further in the current publication.
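
To see how a negative inhibition value can arise, consider a minimal sketch that assumes percent inhibition is computed from the reaction rate with compound relative to an uninhibited control. This data reduction is an assumption made for illustration and is not stated in this section; the function name and the rate values are hypothetical.

```python
def percent_inhibition(rate_with_compound, rate_uninhibited, rate_blank=0.0):
    """Percent inhibition relative to the uninhibited control.
    Values below zero mean the signal exceeded the control."""
    control = rate_uninhibited - rate_blank
    if control == 0:
        raise ValueError("uninhibited control rate equals the blank")
    signal = rate_with_compound - rate_blank
    return 100.0 * (1.0 - signal / control)

# Hypothetical rates in arbitrary fluorescence units per minute.
print(percent_inhibition(48.1, 100.0))    # partial inhibition
print(percent_inhibition(733.2, 100.0))   # rate far above the control
```

Under this assumed formula, a value near -633% would correspond to a rate about 7.3 times that of the uninhibited control, which is consistent with the activation hypothesis raised above.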

Only a single compound (222; NCI code 120178) of the dbCICA-derived
hits showed inhibitory activity >50% (at 40 μM) against
MERS-CoV 3CL pro (51.9%; Table S9 and Figure 6). This activity is
comparable to that of the positive control against the MERS-CoV
enzyme (compound 1; 63.8%, Figure 6). However, compound 222
failed to show significant inhibitory activity against HKU4-CoV 3CL pro.
The purity of 222 was confirmed using nuclear magnetic resonance
and mass spectrometry (Figure S2). Another compound, 223 (NCI code
128947), was found to exhibit somewhat lower activity against the
MERS-CoV enzyme (28% inhibition at 40 μM). The purity of 223 was
confirmed using nuclear magnetic resonance and mass spectrometry
(Figure S3). Compounds 222 and 223 share a common
phenylsulfonamide fragment, which is amenable to chemical
modifications. Both compounds were captured by the Hypo(SB-3) and
Hypo(SB-5) pharmacophores (Table 4). Figure 7 shows how hit 222
maps onto the dbCICA pharmacophore models.

Further controls were conducted (as described above) to
rule out fluorescence interference. Neither of these hits showed
significant fluorescence in the assay buffer (no enzyme and no
substrate), in the presence of the enzyme (no substrate), or in the
presence of the substrate (no enzyme) (data not shown). However,
at concentrations >100 μM, 222 showed approximately 10%
attenuation of the cleaved substrate fluorescence (data not shown).
Both 222 and 223 showed moderate apparent IC50 values against
the MERS-CoV 3CL pro of 98.7 μM and 131.1 μM, respectively
(Figure 6). The shape of the activity curve of compound 222, in which
inhibition of fluorescence increases linearly up to a maximum,
indicates the influence of the inner filter effect (Figure S4). 68,69 The inner
filter effect is one of the major challenges usually encountered in
FRET-based enzyme assays. 69
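
The apparent IC50 values and Hill coefficients reported in Figure 6 can be read through a standard Hill dose-response model, sketched below. The functional form and the assumption of 100% maximal inhibition are illustrative choices; the fitting procedure actually used in the study is not described in this section.

```python
def hill_inhibition(conc, ic50, hill_n, max_inhibition=100.0):
    """Percent inhibition at concentration `conc` (same units as `ic50`)
    for a Hill dose-response curve; `conc` must be greater than zero."""
    if conc <= 0:
        raise ValueError("concentration must be positive")
    return max_inhibition / (1.0 + (ic50 / conc) ** hill_n)

# Apparent parameters reported for compound 222 (Figure 6), in micromolar.
ic50_222, n_222 = 98.7, 1.7

for conc in (25.0, 50.0, 98.7, 200.0):
    print(conc, round(hill_inhibition(conc, ic50_222, n_222), 1))
```

By construction the model returns half-maximal inhibition at the IC50, and a Hill coefficient above 1 makes the transition steeper. A measured curve that rises linearly instead of sigmoidally, as described for 222, would deviate from this form.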

The low hit rate observed in this study can be explained by the
limited availability of many of the top-ranked hits in the NCI Open


Compound   % inhibition at 40 μM    Apparent IC50 (μM)      Hill n
           (MERS-CoV 3CL pro)       (MERS-CoV 3CL pro)
222        51.9 ± 7                 98.7 ± 6.0              1.7 ± 0.1
223        28 ± 7                   131.1 ± 4.8             2.8 ± 0.2

FIGURE 6 The chemical structures, inhibitory activities, and apparent IC50 values of the positive control 1, and the 2 tested hits captured by the
dbCICA-derived pharmacophores (222 and 223). dbCICA, docking-based comparative intermolecular contacts analysis



FIGURE 7 dbCICA-based pharmacophores derived from successful dbCICA models (Tables 2 and 3) mapped against hit compound 222. (A) Hypo(SB-1),
(B) Hypo(SB-2), (C) Hypo(SB-3), (D) Hypo(SB-4), and (E) Hypo(SB-5). Green-vectored spheres: HBA, purple-vectored spheres: HBD, blue
spheres: Hbic, violet spheres: HbicArom, and orange-vectored spheres: RingArom. Exclusion spheres are shown in gray.
dbCICA, docking-based comparative intermolecular contacts analysis; HBA, hydrogen bond acceptor; HBD, hydrogen bond donor
Chemicals Repository and hence the limited number of tested hits
(only 78 hits).

There was also a limitation in the availability of published potent
MERS-CoV 3CL pro inhibitors to be used as a training set in modeling
enzyme inhibition. The predictive ability of computational
models depends strongly on the compounds used in
modeling. The training compounds used in the current study are all
peptide-like compounds, and only 14% of them exhibited IC50 values
<10 μM. The effect of the starting training set was most prominent in the
ligand-based modeling (QSAR-guided models), which explains the
poor quality of these models, as indicated by their low MCC values.
Clearly, the quality of the training set is a pivotal factor in
determining the predictive validity of the obtained pharmacophores.
It is also worth noting that the active site-directed design of
nonpeptidomimetic small molecule inhibitors of proteases is often
challenging because of the unique chemistry of the peptide-bond
cleavage transition state and because some proteases cleave their
substrates through an induced fit mechanism. 70

4 | CONCLUSIONS

Recently, special attention has been paid to bat coronaviruses. Two
deadly emerging coronaviruses that have caused unexpected human
disease outbreaks, SARS-CoV and MERS-CoV, are thought to have
originated in bats. MERS-CoV is now considered a threat to global
public health. While its human-to-human transmission is so far limited,
serious concerns over its pandemic potential have been raised.
Therefore, there is an urgent need for the development of effective
and safe anti-MERS-CoV treatments.

In this study, we have explored the pharmacophoric space of the
recently identified peptidomimetic HKU4-CoV 3CL pro inhibitors 36 by 2
independent approaches: QSAR-guided pharmacophore modeling
and dbCICA-based pharmacophore construction. Both approaches
have previously led to the identification of novel potent
inhibitors of a wide variety of targets. QSAR-guided pharmacophore
modeling is a ligand-based method in which pharmacophores are
derived by extensive exploration of the 3D space of carefully
selected, varied small subsets of the inhibitors. These pharmacophores
are then allowed to compete within the context of classical QSAR
using GFA and MLR to identify the combinations that give the best
estimation of the bioactivities. dbCICA modeling, on the other hand,
is a structure-based pharmacophore construction method that relies
on the accurate selection of the most successful combinations of
docking and scoring conditions. The success criterion is the ability of the
docking run to align potent ligands in a way that allows them
to form contacts unattainable by low-potency ligands. dbCICA can be
considered a 3D QSAR that correlates ligands' affinities to their
contacts with certain binding site spots by using GFA and MLR.
Successful dbCICA models can then be translated into binding models
(pharmacophores) to be used as in silico screening tools for virtual
databases.

We have applied these robust computational methods to model
HKU4-CoV 3CL pro inhibitors as a tool to identify inhibitors of
MERS-CoV 3CL pro. These models assisted the identification of 2 hit
compounds with moderate apparent activity against MERS-CoV
3CL pro (apparent IC50 values of 98.7 μM and 131.1 μM). The identified
inhibitors share a novel nonpeptidomimetic
scaffold that is amenable to medicinal chemistry optimization efforts.
Although the inhibitory activity of this scaffold is only modest, it represents a
potential starting point in the discovery of novel MERS-CoV antivirals.
There are several successful examples in the history of drug discovery
in which the starting hits showed low-to-moderate enzyme inhibition.
For example, the millimolar inhibitor Neu5Ac was the starting point in
the development of zanamivir, the first influenza neuraminidase
inhibitor introduced to the market. 71

Most importantly, the established ligand-based and structure-based
pharmacophore models serve as tools for advancing our
understanding of small molecule recognition by the coronavirus 3CL pro
enzymes. The pharmacophores obtained by modeling the HKU4-CoV
3CL pro inhibitors revealed structural features needed for the design of
potent 3CL pro enzyme inhibitors, while the dbCICA models (structure-based
models) highlighted potential hot-spot regions in the 3CL pro pocket
that could be targeted using small nonpeptidomimetic molecules. Such
knowledge is valuable for the successful development of 3CL pro
inhibitors as anti-MERS drugs.